 ranking scheme


The Ranking Blind Spot: Decision Hijacking in LLM-based Text Ranking

Qian, Yaoyao, Zeng, Yifan, Jiang, Yuchao, Jain, Chelsi, Wang, Huazheng

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated strong performance in information retrieval tasks like passage ranking. Our research examines how instruction-following capabilities in LLMs interact with multi-document comparison tasks, identifying what we term the "Ranking Blind Spot", a characteristic of LLM decision processes during comparative evaluation. We analyze how this ranking blind spot affects LLM evaluation systems through two approaches: Decision Objective Hijacking, which alters the evaluation goal in pairwise ranking systems, and Decision Criteria Hijacking, which modifies relevance standards across ranking schemes. These approaches demonstrate how content providers could potentially influence LLM-based ranking systems to affect document positioning. These attacks aim to force the LLM ranker to prefer a specific passage and rank it at the top. Malicious content providers can exploit this weakness, which helps them gain additional exposure by attacking the ranker. In our experiments, we empirically show that the proposed attacks are effective in various LLMs and can be generalized to multiple ranking schemes. We apply these attacks to realistic examples to show their effectiveness. We also found that stronger LLMs are more vulnerable to these attacks. Our code is available at: https://github.com/blindspotorg/RankingBlindSpot


A Novel Ranking Scheme for the Performance Analysis of Stochastic Optimization Algorithms using the Principles of Severity

Chandrasekaran, Sowmya, Bartz-Beielstein, Thomas

arXiv.org Artificial Intelligence

Stochastic optimization algorithms have been successfully applied in several domains to find optimal solutions. Because of the ever-growing complexity of integrated systems, novel stochastic algorithms are being proposed, which makes the performance analysis of these algorithms extremely important. In this paper, we provide a novel ranking scheme to rank the algorithms over multiple single-objective optimization problems. The results of the algorithms are compared using a robust bootstrapping-based hypothesis testing procedure that is based on the principles of severity. Analogous to the football league scoring scheme, we propose pairwise comparison of algorithms as in a league competition. Each algorithm accumulates points and a performance metric of how well or poorly it performed against the other algorithms, analogous to the goal-difference metric in a football league scoring system. The goal-difference metric can be used not only as a tie-breaker but also to quantify the performance of each algorithm. The key novelty of the proposed ranking scheme is that it takes into account the magnitude of each algorithm's achieved performance improvement along with its practical relevance and makes no distributional assumptions. The proposed ranking scheme is compared to classical hypothesis testing; the analysis shows that the results are comparable and that the proposed ranking offers many additional benefits.
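The league-style idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' procedure: the severity-based bootstrapped hypothesis test is replaced here by a plain comparison of mean objective values against a practical-relevance threshold, and the 3/1/0 point values, the function name `league_table`, the parameter `delta`, and the sample data are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean


def league_table(results, delta=0.0):
    """Rank algorithms league-style (illustrative stand-in, not the paper's test).

    results: {algorithm: {problem: [objective values from repeated runs]}}
             (minimization assumed, so lower is better)
    delta:   smallest mean improvement treated as practically relevant
    """
    table = {name: {"points": 0, "goal_diff": 0.0} for name in results}
    for a, b in combinations(results, 2):
        for problem in results[a]:
            # Positive gap means `a` achieved the lower (better) mean.
            gap = mean(results[b][problem]) - mean(results[a][problem])
            table[a]["goal_diff"] += gap
            table[b]["goal_diff"] -= gap
            if gap > delta:          # `a` wins this "match"
                table[a]["points"] += 3
            elif gap < -delta:       # `b` wins this "match"
                table[b]["points"] += 3
            else:                    # difference not practically relevant: draw
                table[a]["points"] += 1
                table[b]["points"] += 1
    # Sort by points, with goal difference acting as the tie-breaker.
    return sorted(table.items(),
                  key=lambda kv: (kv[1]["points"], kv[1]["goal_diff"]),
                  reverse=True)


# Hypothetical run data: three algorithms, two problems, three runs each.
runs = {
    "A": {"sphere": [1.0, 1.2, 0.8], "rastrigin": [5.0, 5.5, 4.5]},
    "B": {"sphere": [2.0, 2.2, 1.8], "rastrigin": [5.0, 5.2, 4.8]},
    "C": {"sphere": [3.0, 3.1, 2.9], "rastrigin": [9.0, 9.5, 8.5]},
}
ranking = league_table(runs, delta=0.1)
for name, row in ranking:
    print(name, row["points"], round(row["goal_diff"], 2))
```

In this sketch the threshold `delta` plays the role of practical relevance: differences smaller than it count as a draw, while the accumulated goal difference still records how large the gaps were.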


Improving Natural Language Understanding with Computation-Efficient Retrieval Representation Fusion

Wu, Shangyu, Xiong, Ying, Cui, Yufei, Liu, Xue, Tang, Buzhou, Kuo, Tei-Wei, Xue, Chun Jason

arXiv.org Artificial Intelligence

Retrieval-based augmentations that aim to incorporate knowledge from an external database into language models have achieved great success in various knowledge-intensive (KI) tasks, such as question-answering and text generation. However, integrating retrievals in non-knowledge-intensive (NKI) tasks, such as text classification, is still challenging. Existing works focus on concatenating retrievals to inputs as context to form the prompt-based inputs. Unfortunately, such methods require language models to have the capability to handle long texts. Besides, inferring such concatenated data would also consume a significant amount of computational resources. To solve these challenges, we propose ReFusion in this paper, a computation-efficient Retrieval representation Fusion with neural architecture search. The main idea is to directly fuse the retrieval representations into the language models. Specifically, we first propose an online retrieval module that retrieves representations of similar sentences. Then, we present a retrieval fusion module including two effective ranking schemes, i.e., reranker-based scheme and ordered-mask-based scheme, to fuse the retrieval representations with hidden states. Furthermore, we use Neural Architecture Search (NAS) to seek the optimal fusion structure across different layers. Finally, we conduct comprehensive experiments, and the results demonstrate our ReFusion can achieve superior and robust performance on various NKI tasks.


The Multi-modality Cell Segmentation Challenge: Towards Universal Solutions

Ma, Jun, Xie, Ronald, Ayyadhury, Shamini, Ge, Cheng, Gupta, Anubha, Gupta, Ritu, Gu, Song, Zhang, Yao, Lee, Gihun, Kim, Joonkee, Lou, Wei, Li, Haofeng, Upschulte, Eric, Dickscheid, Timo, de Almeida, José Guilherme, Wang, Yixin, Han, Lin, Yang, Xin, Labagnara, Marco, Rahi, Sahand Jamal, Kempster, Carly, Pollitt, Alice, Espinosa, Leon, Mignot, Tâm, Middeke, Jan Moritz, Eckardt, Jan-Niklas, Li, Wangkai, Li, Zhaoyang, Cai, Xiaochen, Bai, Bizhe, Greenwald, Noah F., Van Valen, David, Weisbart, Erin, Cimini, Beth A., Li, Zhuoshi, Zuo, Chao, Brück, Oscar, Bader, Gary D., Wang, Bo

arXiv.org Artificial Intelligence

Cell segmentation is a critical step for quantitative single-cell analysis in microscopy images. Existing cell segmentation methods are often tailored to specific modalities or require manual interventions to specify hyperparameters in different experimental settings. Here, we present a multi-modality cell segmentation benchmark, comprising over 1500 labeled images derived from more than 50 diverse biological experiments. The top participants developed a Transformer-based deep-learning algorithm that not only exceeds existing methods, but can also be applied to diverse microscopy images across imaging platforms and tissue types without manual parameter adjustments. This benchmark and the improved algorithm offer promising avenues for more accurate and versatile cell analysis in microscopy imaging. Cell segmentation is a fundamental task that is universally required for biological image analysis across a large number of different experimental settings and imaging modalities. For example, in multiplexed fluorescence image-based cancer microenvironment analysis, cell segmentation is the prerequisite for the identification of tumor sub-types, composition, and organization, which can lead to important biological insights [1]-[3]. However, the development of a universal and automatic cell segmentation technique continues to pose significant challenges due to the extensive diversity observed in microscopy images. This diversity arises from variations in cell origins, microscopy types, staining techniques, and cell morphologies.
Recent advances [4], [5] have successfully demonstrated the feasibility of automatic and precise cellular segmentation for specific microscopy image types and cell types, such as fluorescence and mass spectrometry images [6], [7], differential interference contrast images of platelets [8], bacteria images [9] and yeast images [10], [11], but the selection of appropriate segmentation models remains a non-trivial task for non-expert users in conventional biology laboratories. Efforts have been made towards the development of generalized cell segmentation algorithms [9], [12], [13]. However, these algorithms were primarily trained using datasets consisting of gray-scale images and two-channel fluorescent images, lacking the necessary diversity to ensure robust generalization across a wide range of imaging modalities. For example, the segmentation models have struggled to perform effectively on RGB images, such as bone marrow aspirate slides stained with Jenner-Giemsa. Furthermore, these models often require manual selection of both the model type and the specific image channel to be segmented, posing challenges for biologists with limited computational expertise. Biomedical image data science competitions have emerged as an effective way to accelerate the development of cutting-edge algorithms [14], [15].


Towards Direct Comparison of Community Structures in Social Networks

Das, Soumita, Biswas, Anupam

arXiv.org Artificial Intelligence

Community detection algorithms are generally evaluated by comparing evaluation metric values for the communities obtained with different algorithms. The evaluation metrics used to measure the quality of the communities incorporate topological information about entities, such as the connectivity of nodes within or outside the communities. However, when only the metric values are compared, the topological information of the communities is no longer directly involved in the comparison. In this paper, a direct comparison approach is proposed in which the topological information of the communities obtained with two algorithms is compared directly. A quality measure, namely Topological Variance (TV), is designed based on direct comparison of the topological information of the communities. Based on this newly designed quality measure, two ranking schemes are developed. The efficacy of the proposed quality metric as well as the ranking schemes is studied with eight widely used real-world datasets and six community detection algorithms.